
Convolution: A Practical Review

Ric Losada - MathWorks - DSP Online Conference 2021 - Duration: 01:07:01

This session focuses on the practical aspects of convolution and FIR filtering and some of their applications, from algorithm exploration through to real-time signal processing.

We discuss the tradeoffs between time and frequency domain implementations, including latency and computational cost. For both domains we review the best-known implementation variants and some of the hardware technologies most used for performance acceleration.

This guide was created with the help of AI, based on the presentation's transcript. Its goal is to give you useful context and background so you can get the most out of the session.

What this presentation is about and why it matters

This talk is a practical tour of convolution and finite-impulse-response (FIR) filtering, with emphasis on how to implement long filters efficiently for real-time systems. It compares classical time-domain sample-by-sample implementations (direct and transposed forms), partially serialized and SIMD-accelerated implementations, and frequency-domain block methods based on the FFT (overlap-add and partitioned convolution). The speaker explains when each approach wins or loses in practice, and why design choices (for example, reducing filter length with IFIR or using polyphase/downsampling) can be as important as low-level optimizations.

Who will benefit the most from this presentation

This presentation is aimed at practicing engineers and students who implement real-time DSP systems: audio and spatial-audio developers, embedded and FPGA/ASIC designers, and anyone responsible for high-performance filtering in software or hardware. If you work on headphone binaural rendering, room acoustics, echo cancellation, or any application where filters are long or latency matters, you will get practical guidance and tradeoffs you can immediately test.

What you need to know

Before watching, a few core concepts will make the talk easier to follow:

  • Convolution basics: Linear convolution of discrete signals weights past input samples by the impulse response and sums them. In notation: y[n] = Σ_k x[k] h[n−k], with k running over all integers. With a finite impulse response (length L) the sum has only L terms and is directly implementable.
  • FIR structure choices: Direct-form (tap-delay line) and transposed structures implement the same transfer function, but differ in how state is stored, transient behavior, pipeline friendliness, and how easily they vectorize. For transient/initial-condition differences the state-space view explains why they can produce different outputs when coefficients change at runtime.
  • Complexity versus latency: Time-domain cost grows roughly as O(L) per output sample (L multiplies), while FFT-based block convolution trades compute for block buffering. For block length M and filter length L, frequency-domain linear convolution via the FFT requires choosing an FFT size N satisfying N ≥ L + M − 1. FFT approaches become computationally attractive as L and M grow, but they introduce buffering delay.
  • Optimizations and hardware: SIMD (AVX, NEON, etc.), FMA units, circular buffers, and coarse- and fine-grain parallelism change the practical crossover point between time-domain and frequency-domain methods. Compiler auto-vectorization or hand-tuned intrinsics can make naive time-domain code surprisingly fast.
  • Filter-design level savings: Techniques such as interpolated FIR (IFIR) and polyphase/downsampling can reduce the number of nonzero multiplies needed, often before any implementation-level optimization is attempted. Designing for multirate processing changes where the cost is paid.
  • Partitioned convolution: Long impulse responses can be split into partitions so that short input blocks reduce latency while using FFTs efficiently on each partition. This is essential in audio/binaural applications where perceived latency matters.
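The block structure behind overlap-add can be made concrete with a small sketch (illustrative Python, not code from the talk). For clarity each block here is convolved directly in the time domain; a real implementation would instead compute each block product with FFTs of size N ≥ L + M − 1:

```python
def conv_direct(x, h):
    """Direct linear convolution: O(len(x) * len(h)) multiplies."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(h)):
            if 0 <= n - k < len(x):
                y[n] += h[k] * x[n - k]
    return y

def conv_overlap_add(x, h, M=4):
    """Overlap-add: split x into blocks of M samples, convolve each
    block with h, then overlap and add the length M + L - 1 results.
    (A real implementation would do each block convolution with FFTs
    of size N >= L + M - 1 instead of conv_direct.)"""
    L = len(h)
    y = [0.0] * (len(x) + L - 1)
    for start in range(0, len(x), M):
        yb = conv_direct(x[start:start + M], h)  # length <= M + L - 1
        for i, v in enumerate(yb):               # overlap-add into output
            y[start + i] += v
    return y

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
h = [1.0, -1.0, 0.5]
assert all(abs(a - b) < 1e-12
           for a, b in zip(conv_direct(x, h), conv_overlap_add(x, h)))
```

By linearity, summing the block results reproduces the full linear convolution exactly; the price is the buffering delay of collecting M samples before a block can be processed.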

Glossary

  • Convolution — The discrete-time operation that blends input samples with an impulse response to produce output samples; the mathematical core of FIR filtering.
  • FIR (Finite Impulse Response) — A nonrecursive filter whose impulse response has finite length L and which is implemented via convolution.
  • Impulse Response (h[n]) — The filter coefficients; the time-domain pattern that defines how an FIR filter responds to an impulse.
  • Direct form — The straightforward tap-delay line FIR implementation where past inputs are multiplied by coefficients and summed.
  • Transposed form — An alternative FIR layout with delays between adders; often more pipeline- and SIMD-friendly and better for coarse-grain parallelism.
  • Overlap-add — An FFT-based block convolution method that breaks input into blocks, convolves each block with the full impulse response in the frequency domain, then overlaps and adds the outputs.
  • Partitioned convolution — Splitting a long impulse response into smaller blocks (partitions) to reduce latency while using FFT acceleration on each partition.
  • IFIR (Interpolated FIR) — A design technique that stretches a short prototype filter by inserting zeros between its coefficients to achieve a narrow transition band, then cascades it with an image-suppressor filter; the combination needs far fewer nonzero coefficients than a single long FIR.
  • Polyphase / Noble identities — Multirate identities that enable efficient combination of downsampling/upsampling and filtering for computational savings.
  • FFT (Fast Fourier Transform) — An algorithm to compute discrete Fourier transforms efficiently; used to turn convolution into pointwise multiplication in frequency-domain methods.
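As a concrete illustration of the partitioned-convolution entry above, here is a minimal uniform-partitioning sketch in Python (illustrative only, not from the presentation; real-time engines replace the per-partition time-domain convolution with FFT products so that short input blocks keep latency low):

```python
def conv_direct(x, h):
    """Direct linear convolution (reference)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(x)):
        for k in range(len(h)):
            y[n + k] += x[n] * h[k]
    return y

def conv_partitioned(x, h, P=2):
    """Uniform partitioned convolution: split h into chunks of P taps,
    convolve x with each chunk, and add each result back in, delayed
    by the chunk's offset within h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(h), P):
        yp = conv_direct(x, h[start:start + P])
        for i, v in enumerate(yp):
            y[start + i] += v
    return y

x = [1.0, 0.5, -2.0, 3.0]
h = [1.0, 2.0, 0.0, -1.0, 0.5]   # stands in for a long impulse response
assert all(abs(a - b) < 1e-12
           for a, b in zip(conv_partitioned(x, h), conv_direct(x, h)))
```

Splitting h this way is exact for any partition size P, which is what lets audio engines pick small partitions near the head of the impulse response (for low latency) and larger, FFT-friendly ones for the tail.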

Final note

This is a practical, no-nonsense talk: it pairs clear explanation of classical algorithms with measured benchmarks and real-world considerations (latency, SIMD, compiler behavior, multithreading, and filter design choices). Expect actionable rules of thumb rather than hard dogma: the best method depends on filter length, block size, hardware features, and whether you can redesign the filter. If you implement or optimize FIRs for audio, embedded systems, or spatial audio, you will come away with techniques you can try immediately and a better sense of which trade-offs matter.


fred_h
Score: 0 | 4 years ago | no reply

one very effective option... do the 5-to-1 down sampling in polyphase and then a 1-to-5 interpolator to keep the same input sample rate but have all operations at the reduced sample rate... always beats the IFIR filter, which raises the sample rate and uses wider registers to hold the zero-packed data.

fred_h
Score: 0 | 4 years ago | no reply

the longest room acoustic impulse response i ever heard was a cathedral in England... they wanted to show how the echoes significantly altered how a modern piece of music would sound in a long-echo environment... i think it was a MATLAB-based demo...

Darkphibre
Score: 0 | 4 years ago | no reply

On circular buffers: I believe you can get rid of the if statement with a modulus operator to recenter back to start if the memory is aligned, but ymmv whether branch mispredictions vs. division is cheaper.
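One branch-free variant of the circular-buffer wrap discussed in the comment above (an illustrative sketch, not from the talk): if the buffer length is a power of two, the index can wrap with a bitwise AND, avoiding both the branch and the division implied by a general modulus:

```python
SIZE = 8                  # buffer length; must be a power of two
MASK = SIZE - 1           # 0b0111: AND-ing with this wraps the index
buf = [0.0] * SIZE
write = 0                 # next write position

def push(sample):
    """Store a sample and advance the write index without any branch:
    (write + 1) & MASK is equivalent to (write + 1) % SIZE here."""
    global write
    buf[write] = sample
    write = (write + 1) & MASK

for s in range(10):       # 10 pushes wrap past the end of the buffer
    push(float(s))
# write is now 10 & 7 == 2; the two oldest samples were overwritten
```

This trades a small constraint (power-of-two length, common for FIR delay lines anyway) for fully predictable, division-free index arithmetic.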

Darkphibre
Score: 0 | 4 years ago | no reply

On slide SIMD Implementation: When doing SIMD, doesn't each register still rely on the prior results (the other registers) to finish? Did we shift gears away from the partial-parallel you were talking about for ASICs?